Efficient Code Optimization Technique for Itanium2 Cache System and Scientific Computing
نویسندگان
چکیده
To keep up with a large degree of ILP, Itanium2 L2 cache system uses a complex organization scheme: load/store queues, banking and interleaving. In this paper, we study the impact of this cache system on memory instruction scheduling. We demonstrate that for scientific codes, “memory access vectorization” allows to generate very efficient code (up to the maximum of 4 loads per cycle). The impact of such “vectorization” on register pressure is analyzed: various register allocation schemes are proposed and evaluated.
منابع مشابه
Optimizing Performance on Modern HPC Systems: Learning From Simple Kernel Benchmarks
We discuss basic optimization and parallelization strategies for current cache-based microprocessors (Intel Itanium2, Intel Netburst and AMD64 variants) in single-CPU and shared memory environments. Using selected kernel benchmarks representing data intensive applications we focus on the effective bandwidths attainable, which is still suboptimal using current compilers. We stress the need for a...
متن کاملCombinatorial Problems in Compiler Optimization
Several important compiler optimizations such as instruction scheduling and register allocation are fundamentally hard and are usually solved using heuristics or approximate solutions. In contrast, this thesis examines optimal solutions to three combinatorial problems in compiler optimization. The first problem addresses instruction scheduling for clustered architectures, popular in embedded sy...
متن کاملAn efficient memory operations optimization technique for vector loops on Itanium 2 processors
To keep up with a large degree of instruction level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking and interleaving. In this paper, we study the impact of these cache systems on memory instructions scheduling. We demonstrate that, if no care is taken at compile time, the non-precise memory disambiguation mechanism and the banking str...
متن کاملSECTORED CACHE PERFORMANCE EVALUATION: A case study on the KSR-1 data subcache
Sectoring is a cache design and management technique that is re-emerging as cache sizes get larger and computer designers strive to exploit the possible gains from using large block (line) sizes due to spatial locality. Sectoring allows for a small tag array size to suffice retaining address tags only for the large blocks, but still avoids huge miss penalties by utilizing a smaller transfer siz...
متن کاملA Stable and Efficient Loop Tiling Algorithm
Loop tiling is an effective optimizing transformation to boost the memory performance of a program, especially for dense matrix scientific computations. The magnitude and stability of the achieved performance improvements is heavily dependent on the appropriate selection of tile sizes. Many existing tile selection algorithms try to find tile sizes which eliminate self-interference cache conflic...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007